Bayesian Algorithm

Bayesian spam filtering is one of the more common techniques used in spam filters today. It requires manual intervention, because a user must train the filter on what is and is not spam. The filter applies Bayes' theorem, using probabilities to weigh each message.

By comparing a large set of legitimate email with a large set of spam, a Bayesian spam filter learns which words and combinations of words are statistically likely to occur in spam messages and which are statistically likely to occur in legitimate messages. It then uses those statistics to estimate the probability that a new email is spam or legitimate.
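The word-statistics idea above can be illustrated with a minimal naive Bayes sketch. It is not how any particular product implements its filter: the function names, the whitespace tokenizer, the add-one smoothing and the tiny training sets are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and split on whitespace; real filters tokenize more carefully.
    return text.lower().split()

def train(messages):
    """Count in how many messages each word appears."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | words) with naive Bayes over word presence."""
    # Work in log space to avoid underflow; start from the class priors.
    log_spam = math.log(n_spam / (n_spam + n_ham))
    log_ham = math.log(n_ham / (n_spam + n_ham))
    for word in set(tokenize(text)):
        # Add-one smoothing so a word unseen in one class does not zero it out.
        log_spam += math.log((spam_counts[word] + 1) / (n_spam + 2))
        log_ham += math.log((ham_counts[word] + 1) / (n_ham + 2))
    # Clamp the difference so math.exp cannot overflow on long messages.
    diff = max(min(log_ham - log_spam, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(diff))

# Hypothetical training data, far smaller than a real corpus.
spam_msgs = ["win cash now", "cheap pills win prize", "claim your cash prize now"]
ham_msgs = ["meeting agenda for monday", "lunch on monday", "project status report attached"]
spam_counts, ham_counts = train(spam_msgs), train(ham_msgs)

p_spammy = spam_probability("win a cash prize", spam_counts, ham_counts,
                            len(spam_msgs), len(ham_msgs))
p_legit = spam_probability("monday project meeting", spam_counts, ham_counts,
                           len(spam_msgs), len(ham_msgs))
```

With this toy data the first message scores close to 1 and the second close to 0. A real filter would need far larger training sets before its scores could be trusted.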

After the Bayesian filter has scored a sufficient number of emails, a user needs to keep an eye on the email logs to see how well it has learnt to classify emails. Fine-tuning may be required if the filter is scoring incorrectly. Once the system is scoring emails correctly, it is recommended to leave it alone, because over-scoring can do more harm than good. When the filter starts to produce more false positives and/or false negatives than usual, manually score a subset of emails and fine-tune the filter again. If the filter is still not performing well, it may be necessary to reset it and start again. Occasionally the emails scored by a user are ambiguous (an email looks legitimate and spammy at the same time), which can confuse the system.
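One way to make "more false positives or false negatives than usual" concrete is to compute both rates from a sample of manually reviewed log entries. The sketch below assumes each entry is a pair of booleans (what the filter decided, what the user says). The threshold values are hypothetical and would need tuning for a real deployment.

```python
def error_rates(reviewed):
    """reviewed: list of (filter_said_spam, user_says_spam) boolean pairs."""
    false_pos = sum(1 for predicted, actual in reviewed if predicted and not actual)
    false_neg = sum(1 for predicted, actual in reviewed if not predicted and actual)
    ham_total = sum(1 for _, actual in reviewed if not actual)
    spam_total = sum(1 for _, actual in reviewed if actual)
    # False positive rate: share of legitimate mail that was flagged as spam.
    fp_rate = false_pos / ham_total if ham_total else 0.0
    # False negative rate: share of spam that was let through.
    fn_rate = false_neg / spam_total if spam_total else 0.0
    return fp_rate, fn_rate

def needs_retraining(reviewed, max_fp_rate=0.01, max_fn_rate=0.10):
    # Hypothetical thresholds: blocking legitimate mail is treated as the costlier error.
    fp_rate, fn_rate = error_rates(reviewed)
    return fp_rate > max_fp_rate or fn_rate > max_fn_rate

# Hypothetical review sample: 10 spam (2 missed) and 10 legitimate (1 wrongly blocked).
sample = ([(True, True)] * 8 + [(False, True)] * 2 +
          [(False, False)] * 9 + [(True, False)] * 1)
fp_rate, fn_rate = error_rates(sample)
retrain = needs_retraining(sample)
```

If both rates stay under their thresholds, the check returns False, which matches the advice to leave a well-behaved filter alone.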

Alongside Bayesian filtering, vendors also produce spam signatures for known spam emails. Spam signatures block known spam emails, and an unknown spam email may still be caught by the Bayesian filter or by other filtering strategies. As with all security, defense in depth is required for a more complete security policy. Products such as MIMEsweeper and Barracuda spam filters have several layers of defense mechanisms, so the chances are that one of the layers will catch even rare spam messages.
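The layering described above can be sketched as a signature check followed by a statistical score. This is an illustration only. Vendor signature formats are not described here, so the SHA-256 hash of the normalized body is a stand-in assumption, and `classify`, `bayes_score` and the 0.9 threshold are hypothetical names and values.

```python
import hashlib

def signature(body):
    # Assumption: a "signature" is the SHA-256 of the lowercased,
    # whitespace-normalized body. Real vendor signatures are likely more elaborate.
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def classify(body, known_spam_signatures, bayes_score, threshold=0.9):
    # Layer 1: exact match against signatures of known spam.
    if signature(body) in known_spam_signatures:
        return "spam (signature)"
    # Layer 2: unknown mail falls through to the statistical score.
    if bayes_score(body) >= threshold:
        return "spam (bayesian)"
    return "legitimate"

# Hypothetical usage, with stub scoring functions standing in for a trained filter.
known = {signature("Win CASH now")}
by_signature = classify("win   cash NOW", known, lambda body: 0.0)
by_bayes = classify("new pills offer", known, lambda body: 0.95)
passed = classify("see you at lunch", known, lambda body: 0.10)
```

Passing the scoring function in as a parameter keeps the layers independent, so further layers could be added without changing the existing ones.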